School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, China, 541004
Print publication date: 2024
Li Luo, Yining Liu. Voice Fence Wall: User-optional voice privacy transmission[J]. Journal of Information and Intelligence, 2024, 2(2): 116-129. DOI: 10.1016/j.jiixd.2023.12.002.
Sensors are widely used to collect voice data. Because voice data carry many sensitive attributes, such as user emotion and identity, transmitting raw recordings can pose a serious privacy threat. Traditionally, feature extraction obtains and encrypts voice features, which are then transmitted to upstream servers. To avoid disclosing sensitive attributes, the sensitive attributes of voice data must be separated from the non-sensitive ones. Motivated by this, a user-optional privacy transmission framework for voice data, called Voice Fence Wall, is proposed. First, the framework is user-optional: users can choose which attributes (the sensitive attributes) they want protected. Second, Voice Fence Wall minimizes mutual information (MI) to reduce the correlation between sensitive and non-sensitive attributes, thereby separating them. Finally, only the separated non-sensitive attributes are transmitted to the upstream server, so the quality of voice services is preserved without leaking sensitive attributes. To verify its reliability and practicality, the model is evaluated on three voice datasets. The experiments demonstrate that Voice Fence Wall not only effectively separates attributes to resist attribute inference attacks, but also outperforms related work in classification performance: the framework achieves 89.84% accuracy in emotion recognition and a 6.01% equal error rate in voice authentication.
Keywords: Voice collection; Voice Fence Wall; Voice privacy; Mutual information
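The separation objective above rests on a simple idea: if the mutual information between a transmitted representation and a sensitive attribute is driven to zero, the representation carries no usable signal about that attribute. The toy sketch below (not the paper's implementation, which trains a neural estimator; the variable names and discretization are illustrative assumptions) shows the quantity being minimized, comparing an "entangled" representation that leaks speaker identity with a "separated" one that is independent of it.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete variables."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            px = np.mean(x == xv)                  # marginals
            py = np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

rng = np.random.default_rng(0)
speaker = rng.integers(0, 2, size=2000)    # sensitive attribute (identity)
entangled = speaker.copy()                 # representation that fully leaks identity
separated = rng.integers(0, 2, size=2000)  # representation independent of identity

# Full leakage: MI equals the attribute's entropy (about ln 2 for a fair binary label).
print(mutual_information(speaker, entangled))
# After separation: MI is close to zero, so an attribute inference attack
# observing only this representation can do no better than guessing.
print(mutual_information(speaker, separated))
```

In the actual framework this MI term is estimated on continuous embeddings and minimized during training, but the privacy argument is the same: low MI with the chosen sensitive attribute bounds what any downstream inference attack can recover.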